fix(esc): mid-stream cancellation for OpenAI-compatible + Minimax providers #145
Merged
Conversation
PR #144 added abort-signal-aware streaming for ``AnthropicProvider``, but ``OpenAICompatibleProvider`` and ``MinimaxProvider`` got only the pre-call fast-path. Users running through LiteLLM / GLM / OpenAI / DeepSeek — the most common "OpenAI-compatible proxy → Claude" stack — still saw ESC wait the full model latency before the post-API abort check fired. Same 20+ second symptom as before #144.

Port the response-close listener pattern from #144's AnthropicProvider:

* Register a listener on the abort signal that calls ``stream.response.close()`` to close the underlying HTTP socket. Closing the socket interrupts the SDK's blocking next-chunk read so the iterator raises immediately, even when the model is in a multi-second gap between chunks (extended thinking, tool_use generation).
* For OpenAI-compatible providers, additionally add an in-loop ``if abort_signal.aborted: break`` check at the top of each ``for chunk in stream`` iteration. Covers the case where chunks arrive back-to-back fast enough that the listener's close lands one iteration late, or where the SDK has already prefetched chunks past the close point.
* Signal-state-authoritative exception translation in the ``except Exception`` block — different SDK versions raise different exception classes when the response is closed mid-read, so the signal is the only stable abort indicator.
* Register-then-recheck ordering closes the sub-microsecond race where ``_fire`` can snapshot the listener list and silently drop a freshly-appended listener.
* A ``finally`` block detaches the listener so long-lived controllers (the REPL engine's, reused across many turns) don't accumulate dead listeners.

Minimax wraps the anthropic SDK against its compatible endpoint, so it gets the AnthropicProvider treatment (no in-loop check — the ``with client.messages.stream(...) as stream:`` pattern only exposes ``text_stream``, not a generic iterator).

Ten regression tests pin the contract:

* ``test_openai_compat_abort_signal.py`` (6 tests) — pre-abort fast-path with leaf-level ``assert_not_called()``, mid-stream close via response.close + timing bound, **load-bearing** in-loop check (asserts ``on_text_chunk`` saw only "first", not "second" — mutation-verified by deleting the in-loop check and watching the test fail), normal-completion regression check, ``abort_signal=None`` legacy parity, listener detachment.
* ``test_minimax_abort_signal.py`` (4 tests) — same shape as AnthropicProvider, with ``_ensure_client`` as the fast-path sentinel.

Three-way duplication of the close-listener pattern (Anthropic, Minimax, OpenAI-compat) is acknowledged. Extracting a shared helper is left as a follow-up — the three providers' surrounding contexts differ enough (Anthropic has the watchdog + non-streaming fallback, OpenAI-compat has bare-iterator semantics, Minimax has the ``with``-block + ``get_final_message``) that a premature extraction would either grow the helper to a 4-knob API or leak abstraction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
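For orientation, here is a minimal sketch of the pattern as it applies to the OpenAI-compatible path. It is not the repo's actual code: `AbortError`, the signal API (`aborted` / `add_listener` / `remove_listener`), and how the abort is finally surfaced to the caller are assumptions based on the description above (the real listener registration reportedly passes `once=True`).

```python
# Sketch only. Assumed, not from the repo: AbortError, the AbortSignal API
# (.aborted / .add_listener / .remove_listener), and the OpenAI-style chunk
# shape chunk.choices[0].delta.content.
from openai import OpenAI


class AbortError(Exception):
    """Raised when the caller aborted the request (hypothetical name)."""


class OpenAICompatibleProviderSketch:
    def __init__(self, base_url, api_key, model):
        self.client = OpenAI(base_url=base_url, api_key=api_key)
        self.model = model

    def chat_stream_response(self, messages, abort_signal=None, on_text_chunk=None):
        # Pre-call fast-path: never open the request if ESC already fired.
        if abort_signal is not None and abort_signal.aborted:
            raise AbortError("aborted before request")

        stream = self.client.chat.completions.create(
            model=self.model, messages=messages, stream=True
        )

        def _close_response():
            # Closing the underlying httpx response unblocks the SDK's
            # next-chunk read, so the iterator raises almost immediately.
            try:
                stream.response.close()
            except Exception:
                pass  # best effort; the in-loop check below still applies

        attached = False
        if abort_signal is not None:
            abort_signal.add_listener(_close_response)  # register first ...
            attached = True
            if abort_signal.aborted:                    # ... then recheck, so an abort
                _close_response()                       # racing the registration isn't lost

        parts = []
        try:
            for chunk in stream:
                # In-loop check: covers chunks the SDK already prefetched
                # past the point where the close listener fired.
                if abort_signal is not None and abort_signal.aborted:
                    break
                if not chunk.choices:
                    continue
                text = chunk.choices[0].delta.content or ""
                if text:
                    parts.append(text)
                    if on_text_chunk:
                        on_text_chunk(text)
        except Exception:
            # Signal-state-authoritative translation: SDK versions differ in what
            # they raise when the response is closed mid-read, so trust the signal.
            if abort_signal is not None and abort_signal.aborted:
                raise AbortError("aborted mid-stream")
            raise
        finally:
            if attached:
                abort_signal.remove_listener(_close_response)  # keep long-lived signals clean

        if abort_signal is not None and abort_signal.aborted:
            raise AbortError("aborted mid-stream")
        return "".join(parts)
```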
Summary
PR #144 added abort-signal-aware streaming for `AnthropicProvider`, but `OpenAICompatibleProvider` and `MinimaxProvider` got only the pre-call fast-path. Users on LiteLLM (OpenAI-compatible) → Claude — the user who reported the lingering 10s ESC pause is on this exact stack — still saw ESC wait the full model latency before the outer query loop's abort check fired.

User-reported symptom
After #144 merged, a user testing with LiteLLM → Anthropic Claude Opus 4.7 reported ESC still took ~10s during the model's "Thinking…" phase. The call path went through `openai_compatible.py` (LiteLLM exposes an OpenAI-compatible API), not `anthropic_provider.py`, so #144's listener never registered. This PR closes that gap.

Changes
`src/providers/openai_compatible.py` — two defenses in `chat_stream_response`:

* Response-close listener: registered on the `abort_signal` via `add_listener(..., once=True)`. Calls `stream.response.close()` to close the underlying httpx socket, so the SDK's blocking next-chunk read raises immediately. Handles the user's exact case (long gap between chunks during extended thinking / tool_use generation).
* In-loop check: `if abort_signal.aborted: break` at the top of each `for chunk in stream:` iteration. Catches the SDK-prefetched-chunks case where the listener's close lands one iteration late.
* Signal-state-authoritative exception translation in the `except Exception` block.
* A `finally` block detaches the listener so long-lived controllers don't accumulate listeners.

`src/providers/minimax_provider.py` — Minimax uses the anthropic SDK against its compatible endpoint, so it gets the AnthropicProvider treatment (response-close listener; no in-loop check needed because the `with ... as stream:` only exposes `text_stream`).

`tests/test_openai_compat_abort_signal.py` (new, 6 tests):

* pre-abort fast-path never reaches `client.chat.completions.create` (leaf-level `assert_not_called()`)
* mid-stream abort: `stream.response.close()` was called, within a timing bound
* in-loop check: with an `on_text_chunk` callback, asserts `seen == ["first"]`, not `["first", "second"]` (a sketch of this test's shape follows this list). Mutation-tested by deleting the in-loop check and watching the test fail with the exact expected message.
* normal-completion regression check
* `abort_signal=None` legacy parity
* listener detachment

`tests/test_minimax_abort_signal.py` (new, 4 tests): same shape as the Anthropic tests, with `_ensure_client` as the fast-path sentinel.
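A sketch of what the load-bearing in-loop-check test might look like. The `provider` and `abort_signal` pytest fixtures, the MagicMock-backed client, and the fake chunk shape are illustrative assumptions, not the repo's fixtures.

```python
# Illustrative only: assumes a `provider` fixture whose `client` is a MagicMock
# and an `abort_signal` fixture exposing .aborted / .abort(); both hypothetical.
from types import SimpleNamespace
from unittest.mock import MagicMock


def _chunk(text):
    # Mimics the OpenAI-style streaming chunk shape: chunk.choices[0].delta.content
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


def test_in_loop_check_stops_after_first_chunk(provider, abort_signal):
    seen = []

    def on_text_chunk(text):
        seen.append(text)
        abort_signal.abort()  # fire ESC as soon as the first chunk arrives

    fake_stream = MagicMock()
    fake_stream.__iter__.return_value = iter([_chunk("first"), _chunk("second")])
    provider.client.chat.completions.create.return_value = fake_stream

    try:
        provider.chat_stream_response(
            messages=[], abort_signal=abort_signal, on_text_chunk=on_text_chunk
        )
    except Exception:
        pass  # the provider may surface the abort as an exception; only the callback contract is pinned here

    # Without the in-loop check, the prefetched "second" chunk would leak through.
    assert seen == ["first"]
```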
Test plan

Follow-up (deferred)
Three-way duplication of the response-close-listener pattern across `AnthropicProvider`, `MinimaxProvider`, and `OpenAICompatibleProvider`. The three contexts differ enough (Anthropic has the watchdog + non-streaming fallback, OpenAI-compat has the bare iterator + in-loop check, Minimax has the `with`-block + `get_final_message`) that premature extraction would either grow the helper to a 4-knob API or leak abstraction. Will file as a separate refactor PR.

🤖 Generated with Claude Code